The central research question that this data analysis tries to answer is whether certain political or religious ideologies are more violent than others.
To that end, we propose the following hypothesis:
H1: There is a statistically significant difference in the number of fatalities registered for groups with different ideologies.
We operationalise the independent variable, ideology, by using the presence of strings in actor names that mark a certain ideology as a proxy. This yields 8 political and religious ideologies: Christian, Islam, Ethnic-based, Clan-based, Revolutionary, Republican, Democratic and Liberationary groups.
We operationalise violence for each ideology through a proxy: the number of fatalities registered for each conflict event. This serves as our dependent variable.
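One way H1 could be examined is with a rank-based test, which does not assume normally distributed fatalities. The sketch below applies `kruskal.test()` from base R to made-up data; the choice of test, the group labels and the toy values are assumptions for illustration and not necessarily the procedure used in this analysis.

```r
# Toy data: fatalities per event for three hypothetical ideology groups.
toy <- data.frame(
  Actor_Ideology = rep(c("A", "B", "C"), each = 5),
  FATALITIES = c(0, 0, 1, 0, 2,  0, 3, 5, 0, 8,  0, 0, 0, 1, 0)
)

# Kruskal-Wallis rank-sum test: do all groups share the same distribution?
res <- kruskal.test(FATALITIES ~ Actor_Ideology, data = toy)

res$parameter  # degrees of freedom = number of groups - 1
res$p.value    # a small p-value would speak against identical distributions
```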
We load the ACLED data for Africa (1997 to 2016 and 2017) and for Asia (2015 to 2017) into R (for all code not included in this output, see the attached markdown file).
We then merge the data into one data frame.
In a next step, we assign a region variable to every observation.
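The merging and region assignment can be sketched as follows. This is a minimal illustration under assumptions: the three data frames, their toy values and the country-to-region lookup are hypothetical, and the real ACLED files contain many more columns.

```r
# Toy stand-ins for the three ACLED extracts (hypothetical values).
africa_hist <- data.frame(COUNTRY = c("Somalia", "Nigeria"), YEAR = c(1999, 2010),
                          FATALITIES = c(3, 0), stringsAsFactors = FALSE)
africa_2017 <- data.frame(COUNTRY = "Kenya", YEAR = 2017, FATALITIES = 1,
                          stringsAsFactors = FALSE)
asia <- data.frame(COUNTRY = c("India", "Nepal"), YEAR = c(2015, 2016),
                   FATALITIES = c(0, 2), stringsAsFactors = FALSE)

# rbind() stacks data frames row-wise; the column names must match.
acled <- rbind(africa_hist, africa_2017, asia)

# Hypothetical lookup from country to region, applied to every observation.
region_lookup <- c(Somalia = "Eastern Africa", Kenya = "Eastern Africa",
                   Nigeria = "Western Africa",
                   India = "Southern Asia", Nepal = "Southern Asia")
acled$region <- unname(region_lookup[acled$COUNTRY])

table(acled$region)
```

Countries missing from the lookup would receive `NA`, which makes gaps in the mapping easy to spot.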
In order to analyze the ideologies of actors in later steps, we need to transform our data frame from wide into long format (i.e. into a monadic file). In the resulting data frame, each observation is one actor, so every event appears twice.
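A minimal base R sketch of this reshaping, assuming the two actors of an event are stored in columns named `ACTOR1` and `ACTOR2`; the toy events and the helper column `actor_role` are hypothetical.

```r
# Toy wide (dyadic) data: one row per event, two actor columns.
events <- data.frame(EVENT_ID = 1:2,
                     ACTOR1 = c("Group A", "Group C"),
                     ACTOR2 = c("Group B", "Civilians"),
                     FATALITIES = c(4, 0),
                     stringsAsFactors = FALSE)

# Build one copy of the event rows per actor column and stack them.
keep <- c("EVENT_ID", "FATALITIES")
long <- rbind(
  data.frame(events[keep], actor_role = "ACTOR1", actor = events$ACTOR1,
             stringsAsFactors = FALSE),
  data.frame(events[keep], actor_role = "ACTOR2", actor = events$ACTOR2,
             stringsAsFactors = FALSE)
)

# Sort so that the two rows of each event sit next to each other.
long <- long[order(long$EVENT_ID), ]
long
```

Because every event now appears twice, its fatalities are also listed twice; totals computed on the long data count each event once per actor.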
We use regular expressions and the grepl() function to find those ideologies in actor names that we are interested in, namely Christian, Islam, Ethnic-based, Clan-based, Revolutionary, Republican, Democratic and Liberationary. We then extract them as a new variable Actor_Ideology measuring the ideology of the group in question for every actor involved in a conflict.
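The pattern matching can be sketched as below. The actor names and the regular expressions are hypothetical stand-ins, since the patterns actually used are not shown in this output; the sketch also assumes that the first matching pattern wins when a name fits several ideologies.

```r
# Hypothetical actor names.
actors <- c("Example Islamic Front", "Example Christian Militia",
            "Example Revolutionary Army", "Example Clan Militia", "Protesters")

# Hypothetical patterns, one per ideology label.
patterns <- c(Islam = "islam|muslim",
              Christian = "christian",
              Revolutionary = "revolution",
              Clan = "clan")

# Start with a catch-all label and overwrite it on the first match.
Actor_Ideology <- rep("Other", length(actors))
for (label in names(patterns)) {
  hit <- grepl(patterns[[label]], actors, ignore.case = TRUE) &
    Actor_Ideology == "Other"
  Actor_Ideology[hit] <- label
}

data.frame(actors, Actor_Ideology)
```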
In order to work with the variable EVENT_TYPE, we transform it into a factor variable. Note that the data include some spelling errors in EVENT_TYPE that lead to duplicate entries for two categories. We account for this by first including the misspelled entries in the transformation to a factor variable and then renaming the respective categories, so that we obtain the correct number of categories.
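The renaming step relies on the fact that assigning an already existing label to a factor level merges the two levels. A minimal sketch, in which the misspelled variant is a hypothetical example rather than the actual error found in the data:

```r
# Toy EVENT_TYPE values with one hypothetical spelling variant.
event_type <- c("Riots/Protests", "Violence against civilians",
                "Violence Against Civilians", "Riots/Protests")

EVENT_TYPE <- factor(event_type)
nlevels(EVENT_TYPE)  # the variant counts as a separate category

# Renaming the variant to the correct label merges both levels.
levels(EVENT_TYPE)[levels(EVENT_TYPE) == "Violence Against Civilians"] <-
  "Violence against civilians"

nlevels(EVENT_TYPE)
table(EVENT_TYPE)
```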
We deactivate scientific notation to create more intuitive plot labels.
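In base R this is typically done through the `scipen` option; the value 999 below is a common convention and an assumption about the setting used here.

```r
options(scipen = 0)           # R's default penalty
before <- format(100000)      # "1e+05"

options(scipen = 999)         # strongly penalise scientific notation
after <- format(100000)       # "100000"

c(before = before, after = after)
```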
Some summary statistics give us a first picture of our data.
We can see that the mean value of FATALITIES is 3.75. The standard deviation is rather large at over 70, which points to substantial variation within our data. Because the standard deviation is much higher than the mean, we need to account for this in our model. Moreover, we compute the range of FATALITIES (minimum and maximum values). The range is very large: the minimum value is 0 and the maximum value is 25000.
| | vars | n | mean | sd | min | max | range | se |
|---|---|---|---|---|---|---|---|---|
| X1 | 1 | 193856 | 3.752363 | 70.36753 | 0 | 25000 | 25000 | 0.1598206 |
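
The columns of this table can be reproduced with base R alone. The helper `describe_basic()` below is a hypothetical name and the toy vector is made up; the last lines check that the reported standard error is consistent with the reported n and sd, since se = sd / sqrt(n).

```r
# Hypothetical helper returning the same statistics as the table above.
describe_basic <- function(x) {
  x <- x[!is.na(x)]
  data.frame(n = length(x), mean = mean(x), sd = sd(x),
             min = min(x), max = max(x), range = max(x) - min(x),
             se = sd(x) / sqrt(length(x)))
}

toy <- c(0, 0, 0, 2, 8)
stats <- describe_basic(toy)
stats

# Consistency check using the values reported in the table.
n_reported <- 193856
sd_reported <- 70.36753
se_check <- sd_reported / sqrt(n_reported)
se_check  # should agree with the reported se of 0.1598206
```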
Finally, we calculate the proportion of observations with zero FATALITIES and check whether there are any missing values. We find that over 70% of observations have zero FATALITIES, which is important to keep in mind for our model assumptions. There are no missing values in FATALITIES.
| proportion_zerofatalities | NA_FATALITIES_Count |
|---|---|
| 0.723372 | 0 |
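
Both quantities are one-liners in base R. A minimal sketch on a made-up vector, with result names chosen to mirror the table headers:

```r
# Toy fatalities vector (hypothetical values).
fatalities <- c(0, 0, 0, 5, 1, 0, 12, 0)

# The mean of a logical vector is the share of TRUE values.
proportion_zerofatalities <- mean(fatalities == 0)

# Number of missing values.
NA_FATALITIES_Count <- sum(is.na(fatalities))

data.frame(proportion_zerofatalities, NA_FATALITIES_Count)
```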
Looking at the distribution of FATALITIES (frequency polygon), we can see that it is highly skewed as a result of a high share of zero fatalities and large outliers.
To get a better picture of the usual values of fatalities, we restrict the data for the frequency polygon to 30 fatalities. Only 2872 observations are above that threshold.
We subset the distribution of FATALITIES by EVENT_TYPE, region and Actor_Ideology. The distributions are largely similar across all categories and show the same level of skewness, a large share of zeros and many large outliers.
We first clean the long data set by removing the unspecific actor ideologies, i.e. “Other”, “Unidentified” and “None-Civilian”, and then display the distribution of FATALITIES by the relevant categories of Actor_Ideology.
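A minimal sketch of this cleaning step on toy data; the rows are hypothetical, while the three category labels are the ones named above. `droplevels()` removes the now empty factor levels so they do not show up in later plots or tables.

```r
# Toy long data with a factor ideology column (hypothetical rows).
long <- data.frame(
  Actor_Ideology = factor(c("Islam", "Other", "Clan",
                            "Unidentified", "None-Civilian", "Ethnic")),
  FATALITIES = c(3, 0, 1, 0, 0, 7)
)

unspecific <- c("Other", "Unidentified", "None-Civilian")

# Keep only rows with a specific ideology, then drop unused levels.
clean <- droplevels(long[!long$Actor_Ideology %in% unspecific, ])

levels(clean$Actor_Ideology)
```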
We now take a look at the distribution of Actor_Ideology.
Actors with ethnic, liberation and democratic ideologies are responsible for the highest number of conflict incidents. Groups associated with Islam, Revolutionary or Clan ideology follow in terms of the number of conflicts. Republican and Christian groups exhibit very few incidents of conflict.
We subset the distribution of Actor_Ideology by region in order to analyze potential regional variance. Please note that low counts in South-Eastern Asia and Southern Asia are mainly attributable to the lack of data for these regions before 2015. We can see that Clan ideology is particularly present in Eastern Africa, whereas Liberation and Islam are the major actor ideologies in Northern Africa. In Western Africa, revolutionary ideology is prominent. Southern Africa does not show any pattern with respect to ideology. Moreover, Islam is the most important ideology for South-Eastern Asia and, to a lesser extent, for Southern Asia.
We continue with an analysis of the distribution of EVENT_TYPE.
We first clean the data to remove missing values in EVENT_TYPE. Then, we plot the counts of EVENT_TYPE to get a better picture of the distribution of this variable.
We see that most conflicts are “riots/protests” (around 73 000 events), followed by “Violence against civilians” and “Battles that do not involve a change of territory” (around 45 000 each). “Headquarters or base established”, “Battle-Government regains territory”, “Non-violent transfer of territory” and “Battle-Non state actor overtakes territory” occur comparatively rarely.
We take a more in-depth look by stacking the former graph by region. Please note again that Asia is significantly underrepresented in the data, so low counts are at least partly attributable to this fact. However, we can still see that “Riots/Protest” is a particularly frequent event type in Southern Asia. Moreover, violence against civilians occurs comparatively often in Eastern Africa and Northern Africa, and “Battle-no change of territory” is dominated by incidents of conflict in Eastern Africa.
Looking at the overall regional distribution for all years, and not taking into account that data for Asia are not available before 2015, Eastern Africa exhibits the most conflict incidents at 57 700. Northern Africa (ca. 40 000), Western Africa (ca. 27 000) and Eastern Africa (ca. 21 000) also show relatively high numbers of incidents. Surprisingly, Southern Asia has the third largest number of conflict incidents (ca. 34 000), even though data on this region are only available for the years 2015 to 2017.
In order to get a first idea of covariation between our dependent variable and relevant independent variables, we conduct the following analysis.
The boxplot shows very similar distributions across event types. The median is 0 for all event types. Violence against civilians has a particularly large outlier at 25 000 fatalities, and Battle-No Change of territory has an outlier at around 6000.
Filtering out these two outliers gives the following boxplot, which shows more of the variation within the categories. The skewness of fatalities is clearly visible, with most values at or around zero and many large outliers present.
The event types “Violence against civilians” and “Battle-No change of territory” have the most fatalities. However, these are also the two event types with the largest outliers. Other event types exhibit comparatively few victims.
The boxplot shows very similar distributions across regions. The median is 0 for all regions. The skewness of fatalities is clearly visible, with most values at or around zero. Except for Southern Asia, South-Eastern Asia and Southern Africa, all regions exhibit large outliers.
A bar graph of the relationship between region and FATALITIES shows that Eastern Africa (ca. 23 000) and Middle Africa (ca. 25 000) have the most fatalities. Northern Africa also features relatively high numbers of fatalities. Southern Africa has very few victims. For the Asian regions, numbers are also low, but data constraints need to be taken into account here.
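The totals behind such a bar graph can be computed with `aggregate()`. The sketch below uses made-up values, so its numbers are unrelated to the figures reported above.

```r
# Toy data: fatalities per event with a region label (hypothetical values).
d <- data.frame(
  region = c("Eastern Africa", "Middle Africa",
             "Eastern Africa", "Southern Africa"),
  FATALITIES = c(10, 25, 5, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities within each region and sort from highest to lowest.
totals <- aggregate(FATALITIES ~ region, data = d, FUN = sum)
totals <- totals[order(-totals$FATALITIES), ]
totals
```

Passing `totals$FATALITIES` with `names.arg = totals$region` to `barplot()` would then draw the bars in descending order.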
We create boxplots of FATALITIES by Actor_Ideology with no limit on fatalities, with a limit at 1400 and with a limit at 30 fatalities.
In the boxplot with no limit on fatalities, we see that the largest outlier in our data (25000 fatalities) is within the Liberation ideology.
Limiting our results at 1400 to exclude this outlier, the boxplot shows very similar distributions across Actor_Ideology. Particularly large outliers exist for the Liberation, Islam, Ethnic and Democratic Ideologies.
Taking an even closer look by limiting our boxplot to 30 fatalities, we can see that for the ideologies Islam, Ethnic, Clan and Christian, the median of fatalities is non-zero. In general, IQRs also vary across ideologies.
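The medians and IQRs read off the boxplot can also be computed directly with `tapply()`. The sketch assumes that the limit is applied by filtering observations at or below 30 fatalities rather than by only zooming the axis; the toy values are hypothetical.

```r
# Toy data for two ideologies (hypothetical values).
d <- data.frame(
  Actor_Ideology = c("Islam", "Islam", "Islam",
                     "Liberation", "Liberation", "Liberation"),
  FATALITIES = c(1, 2, 40, 0, 0, 500)
)

# Keep only observations within the limit of 30 fatalities.
capped <- d[d$FATALITIES <= 30, ]

# Median and interquartile range per ideology.
med <- tapply(capped$FATALITIES, capped$Actor_Ideology, median)
iqr <- tapply(capped$FATALITIES, capped$Actor_Ideology, IQR)

rbind(median = med, IQR = iqr)
```

Filtering changes the statistics themselves, whereas zooming the axis would leave them unchanged, so the two approaches can give different medians.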
Looking at a bar chart, we see that Liberation has the most fatalities, followed by Ethnic and Islam.
We analyze the development of conflict incidents and their fatalities over time.
The evolution of conflict incidents shows a clear upward trend. However, the sudden increase in 2015 is due to the additional data for Asia, which are only available from that year onwards.
We take this into account by looking separately at the evolution in Africa and Asia. For Africa only, this yields the following time series (1997 to 2017) with a similar upward trend as before. For Asia, we still see an upward trend, but not as strong as in the Africa data.
We can see that the number of fatalities remains almost stable over time. While fluctuations from one year to the next are relatively small, 1997 (killing of 25 000 Hutu refugees in the DRC) and 1999 (war between Ethiopia and Eritrea) stand out with large, sudden increases in fatalities.
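The yearly series can be derived with `table()` and `tapply()`. In the sketch below the column names `YEAR` and `continent` and all values are assumptions for illustration.

```r
# Toy event data (hypothetical values).
d <- data.frame(
  YEAR = c(1997, 1997, 2015, 2015, 2015, 2016),
  continent = c("Africa", "Africa", "Africa", "Asia", "Asia", "Asia"),
  FATALITIES = c(10, 0, 2, 1, 0, 4)
)

# Number of incidents per year, split by continent.
incidents <- table(d$YEAR, d$continent)
incidents

# Total fatalities per year across both continents.
fatal_by_year <- tapply(d$FATALITIES, d$YEAR, sum)
fatal_by_year
```

Splitting the incident counts by continent keeps the extra Asian observations from 2015 onwards from being mistaken for a jump in the overall trend.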